Hyperscale switch and method for data packet network switching

ABSTRACT

A hyperscale switch is implemented with a plurality of semiconductor crossbar switching elements connected to one another according to a direct point-to-point electrical mesh interconnect for transceiving data packets between peripheral devices connected to the switch and utilizing a lookup table and network device addressing for reduced switching power.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of co-pending U.S. patent application Ser. No. 16/853,496 entitled “HYPERSCALE SWITCH AND METHOD FOR DATA PACKET NETWORK SWITCHING” filed on Apr. 20, 2020, which is a continuation of U.S. patent application Ser. No. 16/357,226 filed on Mar. 18, 2019, which issued as U.S. Pat. No. 10,630,606 entitled “System, Method, and Architecture for Data Center Network Switching”, the entire contents of all of which are incorporated herein by reference for all purposes.

FIELD OF THE INVENTION

The present disclosure relates in general to data packet switched network systems, and more particularly, to a data packet switch management architecture for use in a data center network environment.

BACKGROUND

Cloud computing, or processing via “the cloud,” represents the delivery of on-demand computing resources over a network configuration such as the internet on a pay-for-use basis. Cloud computing is typically implemented via one or more data centers. A data center is a centralized location rendered with computing resources and crucial telecommunications, including servers, storage systems, databases, devices, access networks, software and applications. With the explosive growth of information technology (IT) and applications requiring heightened security, reliability, and efficient and fast processing times, data centers are increasing worldwide in both size and number. Over 8.6 million data centers are estimated worldwide, with such numbers expected to double every few years. Current projections of the cost to power these data centers are approximately $60 billion per year, amounting to about 8% of current global electricity production. Hyperscale data centers which house such massive computing infrastructures not only consume massive amounts of energy but also discharge significant pollutants into the atmosphere each year, including but not limited to hundreds of millions of tons of carbon dioxide (CO₂). Additional problems associated with hyperscale data centers include thermal heating and cooling requirements for ensuring proper device and system operations, and increased capital costs and expenditures for diesel generators, battery backups, power conversion, cooling, and the like. Further still, size and processing limitations associated with semiconductor (e.g. silicon) electronic elements or devices, and the need for enhanced processing speed with its concomitant increase in utilization of and cost for electricity, contribute to the need for new technical solutions.

Networked storage systems and remote computing systems can be included in high-density installations, such as rack-mounted environments. However, as the densities of networked storage systems and remote computing systems increase, various physical limitations are being reached. These limitations include density limitations based on the underlying storage technology as well as computing density limitations based on the various physical space requirements for network interconnects, in addition to significant space requirements for environmental climate control systems.

In addition to the above, these bulk storage systems traditionally have been limited in the number of devices that can be included per host. This can be problematic in storage environments where higher capacity, redundancy, and reliability are desired. These shortcomings may be especially pronounced with the increasing data storage and retrieval needs in networked, cloud, and enterprise environments. Still further, power dissipation in a switch is directly proportional to the number of switch hops needed to traverse integrated circuit devices (and serializers/deserializers, or SERDES) for transferring data packets from a source or ingress port of a network-connected first peripheral device to a destination or egress port of a network-connected second peripheral device. Thus, power requirements and power usage/consumption within network data packet switches represent significant technological as well as environmental challenges.

Alternative systems, devices, architectures, apparatuses, and methods for overcoming one or more of the above identified shortcomings are desired.

SUMMARY

Systems, devices, architectures, apparatuses, methods and computer programs are presented for implementing a network switching apparatus for communicating data packets from a first switch-connected peripheral device to a second switch-connected peripheral device. It should be appreciated that embodiments of the present disclosure may be implemented in numerous ways, as will be understood by one of ordinary skill in the art, and in accordance with such methods, apparatuses, systems, devices, computer programs, and architectures. Several embodiments are described below.

In one embodiment of the present disclosure, a non-Clos network data packet switch apparatus is provided for communicating data packets from a first switch-connected peripheral device to a second switch-connected peripheral device. The apparatus comprises a plurality of semiconductor crossbar switch elements disposed within a chassis and having a plurality of associated external I/O ports for connecting with corresponding peripheral devices for transceiving data packets, and a plurality of associated internal I/O ports. A point-to-point electrical mesh interconnect defines a direct electrical connection between one internal I/O port of each semiconductor crossbar switch element and one internal I/O port of each other semiconductor crossbar switch element. A control processor is configured to maintain a lookup table mapping of peripheral device connections with corresponding external I/O ports associated with the plurality of semiconductor crossbar switch elements. Each semiconductor crossbar switch element is configured to: responsive to detection of a data packet on one of its external I/O ports, determine a destination semiconductor crossbar switch element for the data packet and a destination external I/O port of the destination semiconductor crossbar switch element, according to the lookup table mapping and based on an address header of the data packet. On the condition that the destination semiconductor crossbar switch element is different from the semiconductor crossbar switch element that detected the data packet on one of its external I/O ports, that element outputs the data packet and an indicator of the destination external I/O port onto one of its internal I/O ports that is connected to the destination semiconductor crossbar switch element via the point-to-point electrical mesh interconnect. If the destination semiconductor crossbar switch element is the same as the source element according to the lookup table, the source element outputs the data packet onto its own destination external I/O port according to the lookup table.

Each semiconductor crossbar switch element is further responsive to receipt of a data packet and an indicator of a destination external I/O port on one of its internal I/O ports for outputting the data packet, without the indicator, onto the external I/O port identified by the indicator, to the second switch-connected peripheral device, whereby the routing of data packets from the first switch-connected peripheral device to the second switch-connected peripheral device traverses at most two semiconductor crossbar switch elements.
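
By way of illustration only, the per-element forwarding decision described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the PortRef structure, the dictionary-based lookup table, and the MAC addresses shown are all illustrative.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PortRef:
        element: int        # destination semiconductor crossbar switch element
        external_port: int  # destination external I/O port on that element

    # Lookup table maintained by the control processor: destination MAC
    # address -> attachment point of the peripheral device (illustrative).
    LUT = {
        "aa:bb:cc:00:00:01": PortRef(element=1, external_port=4),
        "aa:bb:cc:00:00:0b": PortRef(element=5, external_port=2),
    }

    def route(my_element: int, dest_mac: str, packet: bytes):
        dest = LUT[dest_mac]
        if dest.element == my_element:
            # Same element: output directly on the local destination
            # external I/O port; the mesh interconnect is not traversed.
            return ("external", dest.external_port, packet)
        # Different element: emit the packet plus an indicator of the
        # destination external I/O port on the internal I/O port wired
        # directly to the destination element.
        return ("internal", dest.element, (dest.external_port, packet))

For example, route(1, "aa:bb:cc:00:00:0b", b"...") yields the internal-port action toward element 5; the receiving element then needs only the prepended port indicator, which is the basis of the two-element traversal limit.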

The semiconductor crossbar switch element comprises at least one integrated circuit. The at least one integrated circuit may be one of an ASIC and an FPGA.

The point-to-point electrical mesh interconnect may be comprised of at least one multi-layer stack of electrically interconnected printed circuit boards. In an embodiment, the at least one multi-layer stack of electrically interconnected printed circuit boards may be silicon-free. In an embodiment, the stack of silicon-free electrically interconnected printed circuit boards may be via-free. Each semiconductor crossbar switch element is configurable for one of 10 Gb, 25 Gb, 40 Gb, 50 Gb, and 100 Gb signal line processing.

In an embodiment, the point-to-point electrical mesh interconnect may be comprised of a plurality of discrete internal network transmission lines or wires. In an embodiment, the point-to-point electrical mesh interconnect may be embodied as twinax or coaxial cables. In an embodiment, the point-to-point electrical mesh interconnect may be embodied as a fiber optic PCB design rather than an electrical/copper PCB. In addition, the embodiment can include modulation schemes, such as PAM, PSK, or QAM modulation, to increase transmission line capacity for network traffic.

In an embodiment, the address header is one of a MAC address header and an IP address header.

In an embodiment, the lookup table stores MAC addresses or IP addresses corresponding to connected peripheral devices.

In an embodiment, the network is an Ethernet network.

In an embodiment, further processing includes performing virtual output queuing on the data packet transfers from ingress to egress of each of the semiconductor crossbar switch elements as part of the crossbar packet transfer scheduler.
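
The virtual output queuing referred to above can be pictured with the following minimal sketch, assuming one FIFO per destination external I/O port at each ingress and a simple round-robin scheduler; the class and method names are hypothetical.

    from collections import deque

    class VOQScheduler:
        """One virtual output queue per egress port avoids head-of-line blocking."""
        def __init__(self, num_egress_ports: int):
            self.voqs = [deque() for _ in range(num_egress_ports)]

        def enqueue(self, egress_port: int, packet: bytes) -> None:
            self.voqs[egress_port].append(packet)

        def schedule(self):
            # Round-robin service across non-empty queues; a packet destined
            # for a busy port no longer blocks packets bound for idle ports.
            for port, q in enumerate(self.voqs):
                if q:
                    yield port, q.popleft()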

In an embodiment, MAC or IP address headers within a lookup table are obtained and mapping updates are made to map peripheral device connections with corresponding external I/O ports associated with the plurality of semiconductor crossbar switch elements. A master lookup table may contain the MAC or IP address headers of the peripheral devices connected with corresponding external I/O ports associated with the plurality of semiconductor crossbar switch elements and periodically update corresponding local lookup tables for access by each of the semiconductor crossbar switch elements.
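
One possible shape of the master-to-local table refresh described above is sketched below, assuming a simple periodic full copy; the refresh period, the data layout, and all names are assumptions rather than details of the disclosure.

    import threading
    import time

    master_lut: dict = {}                      # MAC -> (switch element, external port)
    local_luts = [dict() for _ in range(16)]   # one local copy per switch element

    def refresh_local_tables(period_s: float = 1.0) -> None:
        # Periodically push a snapshot of the master table to every local
        # table, so each switch element forwards from its own copy.
        while True:
            snapshot = dict(master_lut)
            for lut in local_luts:
                lut.clear()
                lut.update(snapshot)
            time.sleep(period_s)

    threading.Thread(target=refresh_local_tables, daemon=True).start()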

In an embodiment, a non-Clos network data packet switch method comprises receiving, at an external I/O port of a first network semiconductor switch element electrically connected to a peripheral device, a network traffic data packet to be forwarded to a second peripheral device connected to an external I/O port of one of a plurality of network semiconductor crossbar switch elements; determining, at the first network semiconductor crossbar switch element, a destination external I/O port on which the network traffic data packet is to be forwarded, according to a lookup table mapping peripheral device connections with corresponding external I/O ports of the plurality of network semiconductor crossbar switch elements, and according to an address header corresponding to the destination peripheral device connected to the network; prepending to the network traffic data packet an indicator of the destination external I/O port of the second network semiconductor switch element; and forwarding the network traffic data packet to the second network semiconductor switch element via a direct point-to-point electrical mesh interconnect which defines a direct electrical connection between one internal I/O port of each semiconductor crossbar switch element and one internal I/O port of each other semiconductor crossbar switch element. The method further comprises receiving, by the second network semiconductor crossbar switch element, at its internal I/O port connected to the first network semiconductor crossbar switch element via the direct point-to-point electrical mesh interconnect, the prepended network traffic data packet; and outputting, by the second network semiconductor crossbar switch element, the network traffic data packet onto the destination external I/O port to the second switch-connected peripheral device, whereby the routing of data packets from the first switch-connected peripheral device to the second switch-connected peripheral device traverses at most two semiconductor crossbar switch elements.
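
The prepend/strip lifecycle recited in the method may be illustrated as follows; the one-byte indicator width is an assumption made for the sketch, as the disclosure does not specify an encoding.

    import struct

    def egress_prepend(dest_external_port: int, frame: bytes) -> bytes:
        # First switch element: prepend the destination external I/O port
        # indicator before forwarding onto the mesh interconnect.
        return struct.pack("!B", dest_external_port) + frame

    def ingress_strip(mesh_frame: bytes) -> tuple:
        # Second switch element: read the indicator, discard it, and output
        # the unmodified frame on the identified external I/O port.
        (dest_external_port,) = struct.unpack("!B", mesh_frame[:1])
        return dest_external_port, mesh_frame[1:]

    wire = egress_prepend(7, b"...ethernet frame...")
    port, frame = ingress_strip(wire)   # port == 7; frame is unchanged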

In an embodiment, a non-Clos data network switch system for communicating data packets from a first switch-connected peripheral device to a second switch-connected peripheral device comprises: a chassis; a plurality of line cards housed within the chassis and having I/O terminals for transceiving data packets; a plurality of semiconductor crossbar switch elements, each having external I/O ports in electrical communication with I/O terminals of corresponding ones of the plurality of line cards housed within the chassis, for routing data packets between switch-connected peripheral devices; and a control processor configured to maintain a lookup table mapping peripheral device connections with corresponding external I/O ports associated with the plurality of semiconductor crossbar switch elements; wherein each semiconductor crossbar switch element comprises a forwarding processor configured to access the lookup table in response to a data packet received at a given external I/O port of the semiconductor crossbar switch element, and route the data packet according to the lookup table and an address header of the data packet, onto another one of the external I/O ports corresponding to a destination one of the plurality of semiconductor switch elements, via a direct point-to-point electrical mesh interconnect directly connecting each of the plurality of semiconductor crossbar switch elements with every other one of the semiconductor crossbar switch elements; whereby the routing of data packets from the first switch-connected peripheral device to the second switch-connected peripheral device traverses at most two semiconductor crossbar switch elements associated with the line cards.

In one embodiment of the present disclosure, a network switching apparatus for communicating data packets from a first switch-connected peripheral device to a second switch-connected peripheral device comprises a chassis containing a plurality of line cards, with each line card interconnected via a direct point-to-point mesh interconnect pattern or network. A control processor is configured to maintain a lookup table mapping peripheral device connections with corresponding I/O ports associated with the plurality of line cards. On each line card a crossbar switching element is configured to enable electrical connection of any one of the line card I/O ports, through the direct point-to-point electrical mesh interconnect pattern which connects each of the plurality of line cards with every other one of the line cards, to a corresponding destination port on one of the plurality of line cards. The switching connection is made in response to detection of a data packet on an ingress I/O port of a given line card. Through the switching element on the line card, the data packet is routed or forwarded over the direct point-to-point electrical mesh interconnect pattern according to the lookup table mapping based on a destination address header of the data packet, whereby transmission of packets between input and output ports of any two line cards and respective crossbar switch elements occurs in only two hops. In a particular embodiment, each switching element has a direct electrical connection to every other switching element, and egress transmission lines output from any switching element are communicated via the electrical mesh interconnect at select differential pair connections for switching purposes, with the final port destination prepended to the data packet transmitted from the switching element so that no further processing or route determination is required on the electrical mesh or on the downstream line card and switching element. The ingress receive lines on the switching element (e.g. corresponding to the destination port or destination peripheral device) receive the data packet directly and pass it through to the destination peripheral port and device. In an embodiment, differential pairs or alternative electrical/optical transmission styles/geometries may be implemented.

According to the architecture of the present disclosure, reduction in the number of physical hops among line cards or integrated circuits on the line cards significantly reduces electrical power consumption and significantly increases speed, in addition to enhancing thermal efficiency and reducing cooling and power requirements of a network packet switch.

In one embodiment of the present disclosure, a hyperscale switch is implemented with a plurality of silicon switching elements, at least one per line card arranged within a rack mount or chassis, each silicon switching element including a routing and forwarding engine for use with a network address identifier such as a media access control (MAC) network address received in a data packet at an I/O port, for transmission to a destination I/O port connected to a peripheral device, wherein an electrical mesh interconnect network architecture provides direct point-to-point connections with each of the corresponding I/O ports on each silicon switching element, within an Ethernet packet routing network configuration.

In another embodiment, the hyperscale switch is implemented with a hypervisor to create and run one or more virtual machines and/or virtualized workloads in accordance with a select network operating system. In one embodiment, the network operating system may be an open source network operating system such as OpenFlow, or a full stack (closed) system installed with applications running natively in the operating system.

In one embodiment, the direct point-to-point mesh electrical interconnect pattern or network is implemented as a printed circuit backplane comprising multi-gigabit transmission lines with direct point-to-point electrical connections.

In one embodiment, the printed circuit backplane electrical interconnect network is achieved such that the backplane of the device is silicon-free.

In one embodiment, each silicon switching element is configured as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA) device (e.g. chip), and the printed circuit backplane comprising multi-gigabit copper transmission lines provides direct point-to-point electrical connections with the integrated circuit (IC) chip I/O connections (or system on a chip, or SoC) on each of the respective line cards.

In an embodiment, the network switching platform may be configured in a data center LAN mesh architecture so as to condense networking and provide a gateway for data services while enabling a data center to expand its network virtualization presence.

In an embodiment, the network switch platform is designed for versatility in multiple data center networking applications, either as a standalone high-capacity switch or as an access, end-of-row, core, or interconnection switch accommodating 10/40/100G optical transceivers with migration capacity.

Embodiments of the present disclosure include a network data packet switch comprising a chassis housing a plurality of line cards having I/O ports thereon for connecting to peripheral devices. Each line card includes one or more silicon switching elements such as ASICs or FPGAs having I/O ports for connecting with every other switching element through a printed circuit backplane, or p-spine architecture, of point-to-point direct electrical interconnections between each of the switching elements (and hence line cards) within the chassis. Each silicon switching element contains therein a forwarding and routing engine for routing data packets according to a packet address header such as a MAC header, via the printed circuit backplane of point-to-point direct electrical interconnections, from a source peripheral device connected to the network switch to a destination peripheral device. The forwarding and routing is performed within the transmitting ASIC or FPGA (or SoC) according to a lookup table containing routing information stored thereon.

In distinction to conventional leaf and spine network architectures, embodiments of the present disclosure provide for a line card with a silicon switching element having a forwarding engine co-located on the line card with routing functionality, whereby communications and routing into/out of the line card and silicon switching element via the point-to-point direct electrical interconnection mesh backplane require fewer serializer/deserializer (SERDES) components and I/O gateway tolls, which increases switch speed or throughput while reducing power and I/O component requirements.

In an embodiment of the present disclosure, each line card is configured in a non-Clos packet switching network and includes a plurality of integrated circuits which define a fabric crossbar implementation, wherein each integrated circuit on each line card has a direct (i.e. point-to-point) electrical connection, via a printed circuit board backplane, with every other integrated circuit on every line card connected via the printed circuit backplane structure.

In an embodiment of the present disclosure, each line card is configured in a non-Clos packet switching network and includes a plurality of field programmable gate array (FPGA) components which define a fabric crossbar implementation, wherein each FPGA on each line card has a direct (i.e. point-to-point) electrical connection, via a silicon-free printed circuit board backplane, with every other FPGA on every line card connected via the silicon-free printed circuit backplane structure.

In an embodiment, the FPGA may be replaced and/or integrated with components including one or more processor cores, microprocessors or microcontrollers, DSPs, graphics processors (GPUs), on-chip memory, hardware accelerators, peripheral device functionality such as Ethernet and PCIe controllers, and the like, for implementation as a system on a chip (SoC) in connection with communication via the direct point-to-point electrical interconnect structure.

In an embodiment, the architecture of the direct (i.e. point-to-point) electrical interconnect structure connecting the semiconductor crossbar switch elements, each of which integrates MAC, data packet routing and disposition, FIFO output queuing and congestion management processing, and VLAN, VXLAN, and VOQ functionality, may be implemented as a virtual switch for execution on a high performance computer server to provide for virtual segmentation, securitization, and reconfiguration.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B are simplified schematic diagrams of tiered network switch architectures according to the prior art.

FIG. 1C is a simplified schematic diagram of a non-Clos network data packet switch architecture according to an embodiment of the present disclosure.

FIG. 2 is a more detailed schematic diagram of a network data packet switch architecture according to an embodiment of the present disclosure.

FIGS. 2A, 2B, 2C, and 2D illustrate exemplary partial and cutaway views showing components of a network data packet switch architecture according to an embodiment of the present disclosure.

FIG. 3 is an exemplary illustration of components of a semiconductor crossbar switch element embodied as an FPGA architecture disposed on a line card with I/O interconnects to a printed circuit backplane of point-to-point direct electrical interconnections between different semiconductor switch elements for implementing the data packet network switching functionality according to an embodiment of the disclosure.

FIG. 4 is a more detailed illustration of FIG. 3, depicting transmit (egress) signal lines out of a semiconductor switch element to interconnections on the printed circuit backplane for data packet transmission to a destination device for implementing the network switch functionality according to an embodiment of the disclosure.

FIG. 4A is a more detailed illustration of FIG. 3, depicting receive (ingress) signal lines from the printed circuit backplane to a receiving (ingress) semiconductor switch element for data packet reception at a destination device for implementing the network switch functionality according to an embodiment of the disclosure.

FIG. 5 is a schematic diagram illustrating components of switch flow module processing associated with a semiconductor switch element embodied as an FPGA architecture for controlling the network data packet transfer from source to destination peripheral devices according to an embodiment of the disclosure.

FIGS. 6A-6B are a process flow illustrating a method of sending a data packet through a network switch with semiconductor switch elements and point-to-point electrical mesh interconnect according to an embodiment of the present disclosure.

FIG. 6C is an exemplary illustration showing fields of a lookup and routing table for processing the data packet transfer from source to destination according to an embodiment of the present disclosure.

FIG. 7A is an exemplary illustration of the point-to-point electrical mesh interconnect structure for providing direct connection between integrated circuits on a plurality of line cards for data packet transfer according to an embodiment of the present disclosure.

FIG. 7B is an exemplary illustration of the point-to-point electrical mesh interconnect structure showing select signal communication lines for providing direct connection between semiconductor switch elements disposed on line cards for data packet transfer according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

It is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, many other elements found in network switches and packet switching systems. However, because such elements are well known in the art, and because they do not facilitate a better understanding of the present invention, a discussion of such elements is not provided herein. The disclosure herein is directed to all such variations and modifications known to those skilled in the art.

In the following detailed description, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. It is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. Furthermore, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented within other embodiments without departing from the scope of the invention. In addition, it is to be understood that the location or arrangement of individual elements within each disclosed embodiment may be modified without departing from the scope of the invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled. In the drawings, like numerals refer to the same or similar functionality throughout the several views.

Although data packet switching networks may take on a number of forms, in one such form a switch fabric may include a card modular platform. A card modular platform typically comprises a backplane and multiple switch fabric modules and/or other types of boards, such as servers, routers, telco line cards, storage cards and so on, contained within a single unit, such as a chassis or shelf, for example, that permits data packet switching between a plurality of network nodes, thereby forming the switch fabric between the network nodes.

FIG. 1A illustrates a switch interconnect in the form of a Clos network 100 according to the prior art. In general, Clos network 100 includes packet switches having uplink ports that are coupled to uplink ports of other packet switches. The packet switches also include downlink ports that may be coupled to hardware resources or peripheral devices (labeled as servers A through H). Such peripheral devices may be implemented not only as servers but as network interfaces, processors, storage area network interfaces, hosts, and so on. The Clos network switching architecture of the prior art may implement a Clos or folded Clos network having a leaf and spine with fabric module architecture as shown in FIG. 1A. Each leaf server or line card device of FIG. 1A is represented as element 110 (shown as 110a-110d) and the spine with fabric module is represented as element 120 (shown as 120a-120d).

The servers are connected to leaf switches (such as top of rack, or TOR, switches) with each leaf switch connected to all spine switches. As shown, each peripheral device or server is at least three physical hops away from every other peripheral device, as the processing requires data packet routing from a source peripheral device (e.g. server A) to a destination peripheral device (e.g. server E) through a 3-hop leaf and spine network (e.g. 110a, 120b, and 110c) to reach its destination. The structure may be further expanded to a multi-stage (e.g. 5-stage) Clos network by dividing the topology into clusters and adding an additional spine layer (also referred to as a super-spine layer). In such a Clos crossbar fabric, as in current art implementations, the additional semiconductor device operating as a staging module in the packet forwarding route places each device 5 hops away from every other. As each hop through a semiconductor device dissipates power (work) through resistance and loses throughput speed traversing the semiconductor, such a system exhibits several disadvantageous features. Aspects of the present disclosure integrate the crossbar switching functionality, the forwarding and routing, virtual output queuing, VLAN, and control plane integration within a semiconductor FPGA device, integrated circuit, or SoC, which may be implemented on a line card, in order to achieve the advantages discussed herein.

FIG. 1B shows another Clos switching embodiment, including Multi-Chassis Link Aggregation Group (MLAG or MCLAG). Servers may be connected to two different leaf 110′ or TOR 101 switches in order to have redundancy and load balancing capability. The prior art Clos architecture embodiment of FIG. 1B may utilize both OSI layer 2 (L2) packet switching as well as layer 3 (L3) routing, where packets are sent to a specific next-hop IP address, based on a destination IP address. FIG. 1B shows a Clos topology using Layer 3 routing for spine 120′ to leaf 110′ connections and multiple TOR as well as leaf switch instantiations. Similar to the routing requirements of FIG. 1A, the Clos architecture of FIG. 1B also requires multiple hops through additional semiconductor devices in the routing of a data packet traversing the data packet switch network in order to transfer packets from one peripheral device (e.g. server A) to another peripheral device (e.g. server C). Each hop through a semiconductor suffers from dissipating power through the resistance physics of semiconductors and loss of throughput speed traversing through the semiconductor.

In contrast to conventional leaf server and spine server network architectures such as those shown in FIGS. 1A and 1B, wherein a multi-tier Clos architecture is implemented which requires multiple hops (3 or greater) to switch data packets from a given input port of a connected device (e.g. server A) to a given output port of a connected device (e.g. server B), embodiments of the present disclosure provide a non-Clos network implemented as a collapsed or flattened (e.g. linear) form of network element architecture and data packet switching, which reduces the number of hops between I/O devices, while increasing the speed and reducing the power requirements for a given switching operation. Moreover, embodiments of the present disclosure integrate within a single semiconductor switch element multiple functionalities which serve to reduce power dissipation, increase switch speed, and maintain an industry standard form factor within a data network architecture. Embodiments of the present disclosure integrate previously disparate functionalities onto or within a single semiconductor switch element to provide forwarding and routing engine and crossbar switching functionality, virtual output queuing (VOQ), and VLAN functionality within a non-Clos mesh network data packet switch.
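
The hop-count contrast drawn above can be made concrete with a back-of-envelope comparison; the assumption that each chip traversal costs two SERDES crossings (one ingress, one egress) is illustrative, not a figure taken from the disclosure.

    # Chip traversals per packet, per the topologies discussed above.
    SERDES_PER_TRAVERSAL = 2  # assumed: one ingress + one egress crossing

    for name, traversals in [("3-stage Clos", 3),
                             ("5-stage Clos", 5),
                             ("flattened mesh (this disclosure)", 2)]:
        crossings = traversals * SERDES_PER_TRAVERSAL
        print(f"{name}: {traversals} chip traversals, ~{crossings} SERDES crossings")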

In one exemplary embodiment, there is disclosed a chassis which houses multiple line cards or line card blades, where each line card has a faceplate with slots configured to receive a peripheral device connection. Each line card may contain a semiconductor crossbar switching element implemented as an integrated circuit or FPGA or system on a chip and configured to route data packets through a direct point-to-point electrical mesh interconnect. The electrical mesh interconnect directly connects I/O ports on each of the semiconductor crossbar switching elements with every other semiconductor crossbar switching element, whereby data packet routing is accomplished according to an address header on the received data packet and a lookup table of peripheral device connections associated with the semiconductor crossbar switching element, to thereby enable a 2-hop packet switch network. The network may be implemented as a hyperscale or compound switch.

Embodiments of the present disclosure may be implemented within a chassis using rack mount line cards, or may be implemented using blades and various form factors, with particular card configurations (e.g. horizontal, vertical, or combinations thereof), as well as different card/I/O numbers (e.g. N=2, 4, 8, 16, 24, 32, etc., although powers of 2 are not required and the numbers may be any positive integer).

As used herein in embodiments of the present disclosure, the term “hop” represents a single physical hop that includes a direct physical connection between two devices in a system. Similarly stated, a single physical hop can be defined as the routing of a data packet through an integrated circuit (e.g. an FPGA, microchip, ASIC, or other programmable or reprogrammable chip device) and any one set of its transceivers or serializer/deserializer (SERDES) device inputs to its SERDES device outputs on a switching element.

Exemplary embodiments of the present disclosure may implement a network data packet switch comprising line cards configured within a chassis, each having disposed thereon (or associated therewith) a semiconductor crossbar switch element with integrated fabric module connected with every other semiconductor crossbar switch element via a direct point-to-point electrical mesh interconnect backplane structure. In an embodiment, the backplane structure may be semiconductor- or silicon-free. In a particular embodiment, the direct point-to-point electrical mesh interconnect backplane structure may be implemented as a printed circuit electrical mesh interconnect. In another particular embodiment, the direct point-to-point electrical mesh interconnect backplane structure may be implemented as a plurality of discrete wires (e.g. micro wires or nano wires).
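
For a sense of scale of such a full-mesh backplane, standard full-mesh counting applies: with N switch elements, each element dedicates N-1 internal I/O ports to the mesh, and the backplane carries N(N-1)/2 point-to-point links. The sketch below uses N=16 to match the 16-line-card chassis described later; this is arithmetic, not a disclosed specification.

    N = 16                                  # switch elements (one per line card)
    internal_ports_per_element = N - 1      # 15 mesh-facing internal I/O ports
    total_mesh_links = N * (N - 1) // 2     # 120 direct point-to-point links
    print(internal_ports_per_element, total_mesh_links)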

In further distinction to conventional leaf and spine network architectures, embodiments of the present disclosure provide for a semiconductor crossbar switch element having a forwarding engine co-located on a line card with routing functionality, whereby communications and routing into/out of the switch element (and hence line card) via a direct point-to-point electrical mesh interconnect require fewer SERDES and I/O gateway tolls, which increases switching throughput or decreases switch latency, while reducing power and I/O component requirements.

According to a further implementation of the present disclosure, each switch element includes one or more ASICs or field programmable gate array (FPGA) components which, together with the direct point-to-point electrical mesh interconnect, define a fabric crossbar implementation. Each switch element is associated with a line card, and each FPGA on each line card has a direct (i.e. point-to-point) electrical connection (via the silicon-free printed circuit board backplane) with every other FPGA on every other line card.

Referring now to FIG. 1C, there is shown a simplified schematic diagram of a non-Clos data packet network switch architecture 1000 according to an embodiment of the present disclosure. As shown therein, semiconductor crossbar switch elements 1004 (data network packet switches) labeled as L1-L5 are configured in a flattened architecture so that data packet communication between resources (e.g. peripheral devices identified as server A-server K) is accomplished with a reduced number of hops. More particularly, each semiconductor switch element (e.g. L1) has a plurality of associated external I/O ports (e.g. 1004a) for connecting with corresponding peripheral devices (e.g. server A) for transceiving data packets. Each semiconductor switch element also has a plurality of associated internal I/O ports 1004b. A point-to-point electrical mesh interconnect 1003 defines a direct electrical connection between one internal I/O port of each semiconductor crossbar switch element and one internal I/O port of each other semiconductor crossbar switch element. A control processor 1005 is configured to maintain a lookup table (LUT) 1006 mapping of peripheral device connections with corresponding external I/O ports associated with the plurality of semiconductor crossbar switch elements. In response to detection of a data packet on one of its external I/O ports, semiconductor crossbar switch element L1 determines a destination switch element (e.g. L5) for the data packet and a destination external I/O port of the destination semiconductor crossbar switch element (e.g. 1004c), according to the lookup table mapping (LUT) and based on an address header of the data packet. On the condition that the destination semiconductor crossbar switch element is different from the semiconductor crossbar switch element that detected the data packet on one of its external I/O ports, that element outputs the data packet and an indicator of the destination external I/O port onto one of its internal I/O ports that is connected to the destination semiconductor crossbar switch element via the point-to-point electrical mesh interconnect (e.g. 1003a). If the destination semiconductor crossbar switch element is the same as the source element according to the lookup table (e.g. data packet communication between server A and server B), the source element (e.g. L1) outputs the data packet onto its own destination external I/O port (e.g. 1004d) according to the lookup table mapping (without traversing the mesh interconnect). In this manner, the matrix configuration of semiconductor crossbar switch element connections to one another through the point-to-point electrical mesh interconnect, and the re-direct I/O connection when the point-to-point electrical mesh interconnect is not utilized (for the same destination board/same destination semiconductor switch element I/O), provide for a system with less power dissipation, increased throughput speeds, and reduced cooling energy requirements.

On the receive or destination (ingress) side, each semiconductor crossbar switch element (e.g. L5) is further responsive to receipt of a data packet and an indicator of the destination external I/O port at one of its internal I/O ports. In response, the ingress semiconductor element receives and outputs the data packet, without the indicator, onto the external I/O port identified by the indicator (e.g. 1004c), to the second switch-connected peripheral device (e.g. server K). In this manner, the routing of data packets from the first switch-connected peripheral device to the second switch-connected peripheral device traverses a minimum number (at most two) of semiconductor crossbar switch elements, i.e. two hops.

In comparison to the multi-tier and multi-hop leaf and spine with fabric module architecture of FIGS. 1A and 1B, the architecture of the present disclosure implements data packet switching with fewer hops, which increases the routing speed or throughput within the system, reduces the power required by traversing fewer transceiver or SERDES I/O chips, and results in less heat being produced, thereby achieving substantially reduced electrical power consumption and thermal output.

The data packets may comprise a stream of data units (e.g., data packets, data cells, a portion of a data packet, a portion of a data cell, a header portion of the data packet, a payload portion of the data packet, etc.) from a peripheral processing device (devices A-K). The data packet stream forwarded from the first switching element L1 connected to the peripheral processing device A and destined to the peripheral processing device K has prepended onto it an indicator of the destination I/O port for processing through the second crossbar switch element L5 via the direct electrical mesh interconnect 1003a.

Each data packet delivered to and detected by an external I/O port of a semiconductor crossbar switch element includes a header comprising an identifier of the source peripheral processing device (e.g., an Internet Protocol (IP) address or a medium access control (MAC) address of the peripheral processing device), and an identifier of the destination peripheral processing device (e.g., an IP address or a MAC address of the peripheral processing device). The egress semiconductor crossbar switch element strips off the destination address (e.g. the destination MAC address) and uses this address as an index to lookup table 1006. The lookup table contains entries mapping each of the semiconductor crossbar switch elements with I/O ports according to the point-to-point connectivity of the electrical mesh interconnect to the internal I/O ports of each switch element, and each of the external I/O connections to each of the known peripheral devices. The lookup table mapping provides the particular destination (ingress) semiconductor crossbar switch element and the corresponding external I/O port of that destination element that connects to the destination peripheral device. The egress semiconductor crossbar switch element then activates a corresponding one of its internal I/O ports that connects, via the point-to-point electrical mesh interconnect, to the corresponding (ingress) destination switch element that is connected to the destination peripheral device.
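
Consistent with the mapping just described, a single lookup-table entry might carry the fields below; the field names are hypothetical and chosen only to mirror the text.

    # One illustrative entry of lookup table 1006, keyed by destination MAC.
    lut_entry = {
        "dest_mac": "aa:bb:cc:00:00:0b",  # learned from the peripheral device
        "dest_element": "L5",             # destination (ingress) switch element
        "dest_external_port": 3,          # its port wired to the destination device
        "local_internal_port": 4,         # this element's mesh link toward L5
    }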

The egress semiconductor switch element also prepends to the data packet the corresponding external I/O port of the destination semiconductor switch element onto which the data packet is to be forwarded, based on the lookup table mapping. The internal I/O port activated at the egress semiconductor crossbar switch element transfers the data packet, with the destination external I/O port identifier, over the direct electrical connection mesh interconnect to an internal I/O port of the destination (ingress) semiconductor switch element. This destination semiconductor switch element reads the data packet header containing the prepended information of the external I/O port, discards any extraneous header data, and routes the data packet through the switch and onto the port which is directly connected to the destination peripheral device for receipt by that device.

In this manner, at most two semiconductor switch elements are traversed in any data packet switching between any two switch-connected peripheral devices.

Referring now to FIG. 2, in connection with FIG. 1C, there is provided an exemplary embodiment of a non-Clos data network switching apparatus 200 for communicating data packets from a first switch-connected peripheral device (e.g. device A of FIG. 1C) to a second switch-connected peripheral device (e.g. device K of FIG. 1C) within an Ethernet architecture. In the non-limiting embodiment disclosed herein, apparatus 200 illustrates a modular architecture wherein a plurality of circuit boards or line cards 220a, 220b, . . . , 220n are housed within chassis 210. The line cards may be implemented as modular circuit boards, each having a faceplate with I/O ports or slots for connection with peripheral devices for transceiving data packets. It is understood that various types of serial links may be utilized for connection therewith, such as Peripheral Component Interconnect/PCIe, by way of non-limiting example. The I/O communications between peripheral devices may be implemented as one or more of 10 Gb, 25 Gb, 40 Gb, 50 Gb, 100 Gb and/or other relative data rate signal line processing. As shown in FIG. 2, the integrated fabric module 215 according to the present disclosure includes each of the semiconductor crossbar switch elements 225a, 225b, . . . , 225n having its external I/O ports for connecting to peripheral devices through a corresponding line card, and its internal I/O ports connected to corresponding internal I/O ports on every other semiconductor crossbar switch element via the point-to-point electrical mesh interconnect 230.

For each semiconductor crossbar switch element associated with a given line card, a control plane includes a control micro-processor and CPU memory in communication with a master controller 240 and address routing table (e.g. via a separate Ethernet connection) for receiving routing table entries and updates for transfer into each of the semiconductor switch elements. Once received in each of the switch elements (e.g. FPGAs), each routing table gets populated into the forwarding engine for each of the switch flow modules in each of the FPGAs.

FIGS. 2A-2D illustrate an exemplary embodiment of the schematic structures illustrated in FIG. 1C and FIG. 2. With respect to FIGS. 2 and 2A-2D, like reference numerals are used to indicate like parts. As shown, a plurality of rack mounted modular line cards 220a, 220b, . . . , 220n may be removably inserted into a corresponding slot or cavity within the chassis. Although shown in a horizontally stacked configuration with 16 line cards (i.e. 220₁, . . . , 220₁₆), it is understood that other configurations may be implemented.

Various cutaway views of the network switch implementation 200 having a chassis 210 housing a plurality of removable line cards with integrated fabric module are depicted in FIGS. 2A, 2B, 2C, and 2D. In this exemplary embodiment, each line card has disposed thereon a semiconductor crossbar switch element in the form of an FPGA. Each FPGA is connected to every other FPGA on a separate line card via a vertical backplane point-to-point electrical mesh interconnect 230. In an embodiment, the on-board FPGA chips have internal I/O ports connected in point-to-point fashion via a silicon-free printed circuit board trace interconnect backplane. A motherboard 240 containing a master central processing unit (CPU) and lookup/routing tables provides control and communications via a control plane with each of the FPGAs (FIG. 3) disposed on line cards 220. Power is provided via a power control board 250 containing a power supply and accompanying electronic circuits, configured at the base of the chassis. The power supply module may include, for example, a 12 V power supply, AC/DC module, and distribution module for supplying power to the system. A fan assembly 260 is mounted at a back end of the chassis and includes a series of fans positioned relative to the line cards and backplane structures so as to provide optimal cooling to the unit. The illustrated embodiment includes a series of I/O ports on its faceplate for receiving and outputting signals through the line card with integrated fabric structure in a manner that reduces the number of hops, increases speed and reduces power consumption.

In the illustrated embodiment of FIGS. 2A-2D, the point-to-point electrical mesh interconnect may be implemented as a plurality of vertically oriented printed circuit boards with trace connections electrically connected to each of the FPGAs' internal I/Os on each line card via appropriate connector modules 232, 234 according to the desired I/O port speed for the particular switch elements. By way of non-limiting example, connectors such as those manufactured by Molex may be used to provide 64 transmission line differential pairs within a given connector module (e.g. 10 Gb transmission).

FIG. 2D shows a more detailed view of an exemplary line card useful for implementing an embodiment of the present disclosure. As shown, line card 220 may be implemented in a standard 1U (1.75 inch height) configuration. In the particular implementation illustrated, faceplate slots 222 are configured to receive peripheral device connections via a standard pluggable interface for communicating data packets over the network interface. Connectors 224 operate to convey the I/O data packets directly from each of the faceplate terminals (lines not shown in FIG. 2D) to corresponding external I/O ports (not shown in FIG. 2D) of the semiconductor crossbar switch element disposed on circuit board 223 of line card 220. In the illustrated embodiment, circuit board 221 provides direct I/O connections from the faceplate to board 223 via connectors 224, but remains available for other processing and/or networking functionality.

As described herein, a control processor is configured to maintain a lookup table mapping peripheral device connections with corresponding I/O ports associated with the plurality of line cards. A crossbar switching element (e.g. L1, L2, . . . ) is configured on each line card, where the crossbar switching element is adapted to enable electrical connection of any one of the line card I/O ports, through the direct point-to-point electrical mesh interconnect pattern (1003) which connects each of the plurality of line cards with every other one of the line cards, to a corresponding destination port on one of the plurality of line cards, in response to detection of a data packet on an ingress I/O port of a given line card, and according to the lookup table mapping based on an address header of the data packet. In this manner, transmission of data packets between input and output ports of any two line cards and respective crossbar switch elements from source to destination occurs in only two hops.

The control plane includes a control micro-processor and CPU memory in communication with the motherboard on each line card for transfer of routing table entries into each of the FPGAs. Once received in each of the FPGAs, the routing table gets populated into the forwarding engine for each of the switch flow modules (SFMs) (FIG. 5) in each of the FPGAs. Each SFM has the forwarding engine which uses that table. In an embodiment, each SFM may have its own table. The logic that accesses that table is represented as the forwarding engine.

FIG. 3 is illustrative of the components of each of the semiconductor crossbar switch elements, labeled generally as 225 (FIG. 2), disposed on a circuit board such as a line card 220 (FIG. 2) according to an exemplary embodiment of the disclosure. FIGS. 4 and 4A illustrate more detailed representations of element 225 disposed on a line card 220, including an illustration of exemplary signal paths among components and particular Tx/Rx communications within the fabric. Referring to FIGS. 3, 4, and 4A, each semiconductor crossbar switch element includes a field programmable gate array (FPGA) 22522 disposed on a circuit board implemented as a line card. In the exemplary embodiment of FIG. 3, three FPGAs 22522a, 22522b, and 22522c are disposed on each line card and implemented as a routing and forwarding engine in order to route the data packet signals (e.g. 48 lines of one or more of 10 Gb, 25 Gb, 40 Gb, 50 Gb, 100 Gb). Each FPGA has its external I/O ports 22530 directly connected to corresponding terminals of connectors 224. Internal I/O ports 22520 are connected with every other FPGA on every other line card via a direct (i.e. point-to-point) electrical mesh interconnect through connectors 232, 234. In an embodiment, the three FPGAs shown in FIG. 3 are coupled to the other FPGAs on every other line card via a semiconductor-free or silicon-free printed circuit board backplane comprising 6 vertical printed circuit boards 230 (FIGS. 2A-C) and corresponding connectors 232, 234. Preferably, input/output channels are arranged evenly across the three integrated circuit chips or FPGAs disposed on each of the line cards. Each chip outputs 48 differential-paired I/O lines to transceiver (T/R) modules, which transmit via the passive fabric to respective inputs. The passive fabric thus provides a direct connection between T/R modules. By enveloping the functionality of the forwarding engine, crossbar switch, control plane, and point-to-point electrical mesh interconnect within an integrated fabric of the semiconductor crossbar switch element, the number of chip traversals needed to forward a packet from one peripheral device to another is reduced. Hence, the power costs corresponding to the number of serial/parallel/serial conversions, or SERDES traversals, are advantageously reduced through the present architecture and processing. More particularly, as the routing and forwarding engine along with the switching functionality is all performed within the semiconductor switching element (e.g. silicon FPGA), and data packets are communicated between egress and ingress FPGAs through the point-to-point electrical mesh interconnect, significant power reduction is realized. This is significant as the transceivers or SERDES on an integrated circuit or FPGA chip dissipate about 50% of the power required. Thus, by reducing the number of hops and hence the number of transceivers, along with collapsing the switching within the geometry of the FPGA, significant power savings are achieved.

Each FPGA has associated packet buffering functionality for regulating network traffic and mitigating network congestion, which may be implemented as one or more DDR memory units 22550. Clock (CLK) sources 22560 associated with each of the FPGAs are configured to control timing and routing of data packets, processes, and control mechanisms throughout the chip.

In the embodiment illustrated in FIGS. 2A-2D, the vertical backplane electrical mesh interconnect structure is configured as having a maximum of 72 differential pairs of signals (Tx/Rx). Each semiconductor switch element associated with each line card has 3 FPGAs per line card. Thus, 48 FPGA chips may be accessed within the chassis such that, for 72 differential pairs each handling 50 GbE, the pathways traversing a given connector correspond to approximately 4 Tb per connector. Further, according to embodiments of the present disclosure, the communications paths between peripheral devices are non-optical within the apparatus. Optical (QSFP/SFP) processing occurs only at the control plane, and that processing is not part of the data packet forwarding path. In an embodiment, each of the printed circuit boards is via-free, with each board having multiple layers or laminates for processing the various Tx/Rx signal paths.
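
As a back-of-envelope check of the per-connector figure quoted above, assuming each of the 72 differential pairs carries one 50 Gb/s lane:

    pairs_per_connector = 72
    lane_rate_gbps = 50
    print(pairs_per_connector * lane_rate_gbps / 1000, "Tb/s")  # 3.6, i.e. ~4 Tb/s

which is consistent with the approximately 4 Tb aggregate stated above.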

In an embodiment of the present disclosure, data packets enter the line card with address data content, and each packet is addressed, by tables controlled and updated by the motherboard, to one of the 48 outputs on the chip. Transmission is fanned out on all three modules while reception (over the mesh interconnect) is provided on a subset of FPGA modules for a given line card.

In an embodiment of the disclosure, the switch element 225 is configured to perform all of the routing and disposition on the chip such that the forwarding engine and routing engine are co-located within the switch element on the corresponding line card 220. In this manner, the ultimate point-to-point connection and routing over the electrical mesh interconnect provides an essentially wired communication path which reduces the SERDES requirements for each differential pair entering/exiting the transceiver face of the line card. In the exemplary embodiment, the circuit board or line card is composed of multiple routing layers with separate transmit and receive layers. Similarly, in one embodiment, the electrical mesh interconnect embodied in one or more printed circuit boards contains corresponding multiple laminate layers for signal transmit and receive functionality.

FIG. 4 illustrates operation of the switch element in connection with the point-to-point electrical mesh interconnect, showing select signal line connections 22535 (internal I/O ports) for activation and forwarding of data packets on the egress side of FPGA 22522c. Also illustrated are switch element I/O signal line connections 22537 (external I/O ports) to select terminals 224 for connection with the peripheral devices for each of FPGAs 22522a-c.

FIG. 4A illustrates operation of the switch element in connection with the point-to-point electrical mesh interconnect, showing select signal line connections 22536 (internal I/O ports) for receiving data packets at the ingress side of each FPGA 22522a-c. Also illustrated are switch element I/O signal line connections 22538 (external I/O ports) to select terminals 224 for connection with the peripheral devices for each of FPGAs 22522a-c.

FIG. 7A is an exemplary illustration of the point-to-point electrical mesh interconnect structure for providing direct connection between integrated circuits on a plurality of line cards for data packet transfer according to an embodiment of the present disclosure. Each terminal connection provides a separate routing path for a differential pair connection associated with the 16 line cards/switch elements.

FIG. 7B is an exemplary illustration of the point-to-point electrical mesh interconnect structure showing select signal communication lines 22 for providing direct connection between semiconductor switch elements disposed on line cards for data packet transfer according to an embodiment of the present disclosure. As can be seen, select connector paths for the differential pairs are fixedly established between internal I/O terminals according to FPGA I/O arrangement and line card identification. As shown, the connection within a given layer of the mesh interconnect shows signal path connectivity between line cards 1 and 14, 15, and 16, by way of non-limiting example.

Referring again to FIGS. 2-4, in an embodiment of the disclosure a control plane on each switch element associated with each line card comprises an internal Ethernet network in communication with the motherboard for communicating independently with each line card/switch element. Communication is accomplished by the control plane sending to each of the line cards its routing table(s) to establish each line card's configuration as to where to send data packets on the network. In an embodiment, a plurality of QSFP ports (e.g. two bidirectional QSFP ports from the motherboard to each line card) provides for n=16 QSFP control signals and 16 line cards within the system in order to provide point-to-point control via the Ethernet within the system. It is understood that other numbers of line cards and/or control signals may be implemented, such as n=4, n=16, or n=32 line cards, by way of non-limiting example. Furthermore, modulation schemes such as PAM-4, QAM, and QPSK may be implemented within the system, as is understood by one of ordinary skill in the art. A processor such as an Intel Pentium or Xilinx processor or other such microprocessor device is configured on the control plane for controlling the routing through the network. The control plane operates to constantly read source device addresses (e.g. source MAC addresses) for devices to add to and/or update the table of connections within the system. As will be understood, for each FPGA, each port is numbered and associated with its line card, FPGA, and point-to-point fabric electrical interconnect, and is known a priori. Because MAC addresses are required to decay at periodic intervals (e.g. 5 sec.), in order that a new device may connect to the network (or an existing device may be maintained), the control plane is constantly responsive to such device broadcasts and reads, updates, and downloads, from its master table within the management plane, the mapping table in order to provide refreshed lookup tables for each of the semiconductor switch elements associated with each line card. Accordingly, the system learns, via the source MAC address of each peripheral device, its relative location on the network and, based on a received destination MAC address, operates to obtain the destination location (e.g. line card number, FPGA number, output port) connected thereto and provide the requisite output port for transferring the data packet. Depending on the type bits received, the system is operable to index down into the payload in order to retrieve the address (e.g. VXLAN processing). Once the process is complete and the LUT provides the destination output port, the semiconductor crossbar switch element forwards the data packet along with the requisite destination output port number via the electrical mesh interconnect, thereby consolidating the forwarding engine into the switching engine.
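
The learning and decay behavior described above may be illustrated with a minimal sketch, by way of non-limiting example; the table size, hash function, and field names below are illustrative assumptions, not part of the disclosure.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TABLE_SIZE  4096   /* assumed table capacity */
#define AGE_OUT_SEC 5      /* periodic decay interval noted in the text */

struct mac_entry {
    uint8_t mac[6];
    uint8_t line_card;     /* 1..16 */
    uint8_t fpga;          /* 1..3  */
    uint8_t port;          /* 0..31 */
    time_t  last_seen;
    int     valid;
};

static struct mac_entry table[TABLE_SIZE];

static unsigned hash_mac(const uint8_t mac[6])
{
    unsigned h = 2166136261u;              /* FNV-1a over the 6 bytes */
    for (int i = 0; i < 6; i++) { h ^= mac[i]; h *= 16777619u; }
    return h % TABLE_SIZE;
}

/* Learn (or refresh) the location of an observed source MAC. */
void mac_learn(const uint8_t mac[6], uint8_t lc, uint8_t fpga, uint8_t port)
{
    struct mac_entry *e = &table[hash_mac(mac)];
    memcpy(e->mac, mac, 6);
    e->line_card = lc; e->fpga = fpga; e->port = port;
    e->last_seen = time(NULL);
    e->valid = 1;
}

/* Look up a destination MAC; stale entries are treated as misses so the
 * control plane re-learns, mirroring the periodic decay described above. */
struct mac_entry *mac_lookup(const uint8_t mac[6])
{
    struct mac_entry *e = &table[hash_mac(mac)];
    if (!e->valid || memcmp(e->mac, mac, 6) != 0)
        return NULL;
    if (time(NULL) - e->last_seen > AGE_OUT_SEC) {
        e->valid = 0;                      /* aged out */
        return NULL;
    }
    return e;
}
```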

As discussed hereinabove, an embodiment of the present disclosure provides for an internal network such as an Ethernet network linking the motherboard or master control to all of the line cards in the chassis. A second internal Ethernet network is disposed on each line card and links all of the FPGAs on each line card to the control microprocessor (e.g. 22500). Thus, the master lookup table is populated (at the motherboard) and updated with requisite peripheral device connections, and flow control is provided to each of the lookup tables on each of the N line cards via N separate parallel Ethernet channels to enable simultaneous writes/updates of the respective tables on each line card. The microprocessor on each line card then sends out the updated tables to each FPGA to enable routing. In an embodiment, the microprocessor on each chip may be an ARM processor operable to execute at a 10G line rate to enable efficient table access and updates (e.g. 3.33 GHz). In an embodiment, the master controller CPU on the motherboard, through the network operating system, writes the lookup tables onto each of the line card/semiconductor switch elements and calls a device driver to modify a programmable chip.
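
By way of non-limiting illustration, a table-update message and its distribution across the N parallel channels might be sketched as follows; the field names and widths are assumptions, not a wire format from the disclosure.

```c
#include <stdint.h>

/* Hypothetical wire format for pushing lookup-table updates from the
 * motherboard's master table to the N line cards over the parallel
 * internal Ethernet channels described above. */
struct lut_update {
    uint8_t mac[6];     /* peripheral device MAC address        */
    uint8_t line_card;  /* destination line card (1..16)        */
    uint8_t fpga;       /* destination FPGA on that card (1..3) */
    uint8_t port;       /* destination I/O port (0..31)         */
    uint8_t op;         /* 0 = add/refresh, 1 = delete          */
} __attribute__((packed));

/* One update is written to all N line-card channels so their local
 * tables stay consistent; the sends are independent and can proceed
 * in parallel, as described above. */
void broadcast_update(const struct lut_update *u,
                      int (*send)(int channel, const void *buf, unsigned len),
                      int n_channels)
{
    for (int ch = 0; ch < n_channels; ch++)
        send(ch, u, sizeof *u);
}
```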

The block diagram of FIG. 5 shows an exemplary internal FPGA architecture of a hyperscale switch slice (HSS) according to an embodiment of the present disclosure. The HSS diagram effectively equates to a single FPGA, with 2 or, in the present embodiment, 3 FPGAs in a line card. As discussed above, in an exemplary embodiment, 16 line cards are provided in a hyperscale switch chassis, each providing 10/25/40/100G Ethernet functionality depending on the line card capability. In an exemplary embodiment, a plurality of Switch Flow Modules (SFMs) (e.g. 58) are provided for a 10 Gb switch element with integrated fabric. Other numbers of instantiations may be required for different rate switching (e.g. 40 Gb). The SFM contains the ingress/egress FIFOs, routing lookup table, and sequencers. The SFM is triggered and commences packet transfers in reaction to the Ethernet MAC core control signals emanating from the core microprocessor (e.g. Xilinx Ethernet Core) on the I/O side indicating reception of a packet. Triggers also occur for packets that arrive via the transceiver, causing the sequencer/scheduler to initiate a request to transfer to the appropriate egress port via a cut-through flow. The appropriate port is determined from the routing lookup table and a hash function used to reduce the address space. The egress stage of the SFM grants requests through an arbiter that resolves simultaneous requests from multiple SFMs. A User Interface AXI Bus manages the router and lookup tables. Quality of Service (QoS) prioritization is implemented in accordance with Ethernet rules.

Referring again to FIG. 3 in conjunction with FIG. 5, overflow processing (DDR) is provided so as to divert packets that cannot be forwarded due to contention to a buffer for subsequent processing. In addition, the integrated semiconductor switch and fabric according to an embodiment of the present disclosure comprises a plurality of n layers or laminates (e.g. n=16) to facilitate the volume of signal connections and direct point-to-point connections within the system.

Referring now to FIG. 6 in conjunction with FIG. 5, there is disclosed a process flow illustrating steps for sending a data packet through a semiconductor crossbar switch element and electrical mesh interconnect according to an embodiment of the present disclosure.

In an exemplary embodiment, the FPGA architecture of FIG. 5 is embodied as a set of N (e.g. N=26) switch flow software modules or components (SFMs) designated as I/O SFMs (labeled 501, 502, . . . , 526), and M (e.g. M=32) SFMs designated as direct connect SFMs (labeled 601, 602, . . . , 632). The direct connect SFMs each have direct connections to the electrical mesh network 230 for packet transport. In an embodiment, each of the I/O SFMs of each FPGA can accept requests from both direct connect SFM modules as well as I/O SFM modules of the FPGA. Direct connect SFM modules are configured so as not to be able to send requests to other direct connect SFMs. In an embodiment, each FPGA's SFM digital logic circuit modules and their functionality and behavior may be implemented in a hardware description language (HDL) useful in describing the structure and behavior of those modules depicted in FIG. 5.

Within the FPGA architecture shown in FIG. 5, a data packet is received at SERDES 1 and communicated to MAC processing module 10, which provides up-chip or down-chip processing (depending on ingress/egress flow) (FIG. 6, block 610). Processing module 10 operates to decrease/increase the packet bit number by N bits (e.g. N=2, from/to 66 bits to/from 64 bits) to address parity as part of the input/output processing and timing or re-timing functionality associated with the switch flow modules of FIG. 5. In this manner, communication channel noise is mitigated by stripping off 2 bits upon entry into the SFM and adding 2 bits upon exit.
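
The 66-bit/64-bit conversion described above is consistent with 64b/66b line coding, in which a 2-bit sync header precedes each 64-bit payload; the following sketch illustrates that interpretation, which is an assumption offered by way of non-limiting example.

```c
#include <stdint.h>

/* A 66-bit block modeled as a 2-bit sync header plus a 64-bit payload. */
struct block66 {
    unsigned sync;     /* 2-bit header: 0b01 = data, 0b10 = control */
    uint64_t payload;  /* 64-bit payload */
};

/* Ingress: drop the 2-bit header and pass the 64-bit payload into the SFM.
 * Returns nonzero if the header is invalid (00 or 11), indicating a bit
 * slip on the line; this is the noise-mitigation property noted above. */
static int strip_sync(const struct block66 *in, uint64_t *out)
{
    if (in->sync != 0x1u && in->sync != 0x2u)
        return -1;                 /* invalid header: resynchronize */
    *out = in->payload;
    return 0;
}

/* Egress: prepend the appropriate header to restore a 66-bit block. */
static struct block66 add_sync(uint64_t payload, int is_control)
{
    struct block66 b = { is_control ? 0x2u : 0x1u, payload };
    return b;
}
```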

Processing proceeds to SFM sequencer module 20 (e.g. VLAN processing) within the SFM FPGA architecture. Sequencer module 20 (e.g. of SFM A) operates to strip off the MAC source address and destination address from the incoming packet (FIG. 6, block 620). The source address is utilized by the system for learning (e.g. MAC learning) to determine what devices are connected to the network at particular ports and to update the master table and corresponding downstream tables for each of the line cards. On the condition that a device address is not in the lookup table, the system forwards it to the microprocessor for delivery to the motherboard for formulation into each of the lookup tables. The destination address is used as an index into the lookup table (LUT) 30 on SFM A in order to determine the output port to route the packet to. This may be implemented via random access memory (RAM) or other on-chip storage media. FIG. 6C is an example showing a LUT wherein an 11-bit field is stored, including 4 bits for line card identification (e.g. line cards 1-16), 2 bits for FPGA identification on the line card (e.g. FPGAs 1-3), and 5 bits of I/O in order to map to 32 different I/O ports. A 2-bit field identifying whether the packet at that particular FPGA is to be routed via the direct electrical mesh interconnect structure, or whether the routing pathway is merely internal to the particular FPGA and/or line card associated with that FPGA (and therefore not sent via the electrical mesh fabric interconnect), may also be provided. Under the latter condition (i.e. 11 (PSP)) the route path would not pass via the direct electrical mesh interconnect structure (e.g. source and destination on one of the FPGAs on the same line card).
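
By way of non-limiting example, the 11-bit LUT entry of FIG. 6C, together with the optional 2-bit route-type field, may be modeled as a packed value; the bit ordering below is an illustrative assumption, as the text does not fix one (only the 0b11 local-route encoding is taken from the description above).

```c
#include <stdint.h>

#define ROUTE_MESH  0x0u  /* assumed: forward via the electrical mesh  */
#define ROUTE_LOCAL 0x3u  /* 0b11 per the text: stays on the same card */

/* Pack 4 bits of line card, 2 bits of FPGA, 5 bits of I/O port (the
 * 11-bit field of FIG. 6C), plus the 2-bit route-type field. */
static inline uint16_t lut_pack(unsigned line_card, unsigned fpga,
                                unsigned port, unsigned route)
{
    return (uint16_t)(((route & 0x3u) << 11) |
                      ((line_card & 0xFu) << 7) |
                      ((fpga & 0x3u) << 5) |
                      (port & 0x1Fu));
}

static inline unsigned lut_line_card(uint16_t e) { return (e >> 7) & 0xFu;  }
static inline unsigned lut_fpga(uint16_t e)      { return (e >> 5) & 0x3u;  }
static inline unsigned lut_port(uint16_t e)      { return e & 0x1Fu;        }
static inline unsigned lut_route(uint16_t e)     { return (e >> 11) & 0x3u; }
```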

Referring again to FIG. 5 in conjunction with FIG. 6, the sequencer module utilizes the MAC address as an index to find the mapping location and raise a request (module 40) to a corresponding SFM (e.g. SFM B) that is connected to the determined line card and the determined FPGA on the line card, in order to route the data packet to the appropriate destination (FIG. 6, block 630). Arbiter module 50 (SFM B) receives the request from I/O SFM A (as well as any other SFM requesting packet transfer) and selects the particular request for processing (FIG. 6, block 640). Upon grant of the request, the packet is transported via the crossbar multiplexer (MUX) arrangement 60-65 for downstream processing (FIG. 6, block 650).
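
By way of non-limiting illustration, the grant step performed by arbiter module 50 might resemble the following round-robin sketch; the disclosure does not specify the arbitration policy, so round-robin is an assumption.

```c
#include <stdint.h>

#define N_REQ 26  /* e.g. one request line per I/O SFM */

struct arbiter {
    unsigned last;             /* index of the last granted requester */
};

/* req_mask has bit i set if SFM i is requesting this cycle. Returns the
 * granted requester index, or -1 if there are no requests. Priority
 * rotates past the winner so simultaneous requesters are served fairly. */
int arbiter_grant(struct arbiter *a, uint32_t req_mask)
{
    if (!req_mask)
        return -1;
    for (unsigned i = 1; i <= N_REQ; i++) {
        unsigned cand = (a->last + i) % N_REQ;
        if (req_mask & (1u << cand)) {
            a->last = cand;    /* rotate priority past the winner */
            return (int)cand;
        }
    }
    return -1;                 /* unreachable when req_mask != 0 */
}
```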

Upon grant of the request, the queued data packet in buffer 70 (ingress FIFO) is transferred via MUX units 60, 65 to the egress FIFO (e.g. module 68) on direct connect SFM B. In an embodiment, the SFMs 601-632 are configured to accept both 10G and 40G pathways via their respective egress FIFO queues (68, 69), which are prioritized according to the quality of service (QoS) processing module 71 and QoS FIFO queue module 72 (FIG. 6, block 660). The QoS module interacts with the VLAN processing to select and sequence, via MUX 74, packets among the different process flows (e.g. 10G, 40G) according to priority requirements, and to transmit the packets in the FIFOs (along with the prepended I/O port number) out onto the electrical mesh interconnect 230 (FIG. 6, block 670). It is to be understood that MUX 74 performs priority swap or selection according to the priority of service, whereby packets and their priorities are linked according to the queues (i.e. next-in-line processing) 72 and staging FIFO (e.g. I/O FIFO) 76.
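
The selection performed by MUX 74 may be illustrated, by way of non-limiting example, with a simple strict-priority sketch; treating the 40G queue as higher priority is an assumption, as the text specifies only that selection follows QoS priority rules.

```c
/* Illustrative strict-priority selection between the 10G and 40G egress
 * FIFO queues described above. */
enum queue_id { Q_40G, Q_10G, Q_NONE };

struct fifo_state {
    int count_40g;  /* packets waiting in the 40G egress FIFO */
    int count_10g;  /* packets waiting in the 10G egress FIFO */
};

enum queue_id qos_select(const struct fifo_state *s)
{
    if (s->count_40g > 0) return Q_40G;  /* assumed higher-priority flow */
    if (s->count_10g > 0) return Q_10G;
    return Q_NONE;                       /* nothing to transmit */
}
```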

In one embodiment, the FIFO operates to enable different data rates (10G/40G) to proceed through the FPGA by means of skewing/de-skewing the data rates via input to the FIFO at a first rate and output from the FIFO at a different rate, as is understood by one of ordinary skill.
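
By way of non-limiting example, such rate adaptation may be modeled as a ring buffer written and read at independent rates; the depth below is arbitrary.

```c
#include <stdint.h>

#define FIFO_DEPTH 64  /* arbitrary illustrative depth */

/* The producer side may be clocked at one rate (e.g. 10G) and the
 * consumer side at another (e.g. 40G); the buffer absorbs the skew. */
struct rate_fifo {
    uint64_t slot[FIFO_DEPTH];
    unsigned head, tail;       /* head: next read, tail: next write */
};

static int fifo_push(struct rate_fifo *f, uint64_t word)
{
    unsigned next = (f->tail + 1) % FIFO_DEPTH;
    if (next == f->head)
        return -1;             /* full: writer must stall or overflow */
    f->slot[f->tail] = word;
    f->tail = next;
    return 0;
}

static int fifo_pop(struct rate_fifo *f, uint64_t *word)
{
    if (f->head == f->tail)
        return -1;             /* empty: reader idles */
    *word = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    return 0;
}
```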

Still referring to FIG. 5 in conjunction with FIG. 6, once the data packet exits the initial FPGA at SERDES 4 (flow 560), it traverses the electrical mesh interconnect 230, which routes the packet to the destination FPGA. As shown in FIG. 5, at the destination FPGA, the sequencer 580 (via flow 570) receives the packet and correlates the port number prepended on the packet, in port number queue 582, with the packet number staging and routing of the packet to the destination port address (FIG. 6, block 680). As previously described, request processing and communication onto the particular port associated with the FPGA via the particular SFM is made through the crossbar (e.g. flow 585), which proceeds through the respective SFM (501-526) to the output port (e.g. flow 586) for receipt by the port-connected peripheral destination device (FIG. 6, block 690). As shown, flow arrows identified as AA, BB, and CC represent data packet flows through the crossbar, with arrows AA being at a rate of about 22 Gb, and arrows BB and CC representing data packet rates of 10 Gb and 40 Gb, respectively.

The FPGA architecture further includes overflow processing SFMs (e.g. 6 instantiations) to alleviate throughput bottlenecks. As shown, in the event of a significant blockage of data flow, a request is made to deposit the overflow packets to an external repository 804 via flow 802. Overflow packets may be retrieved from the DDR (e.g. DDR4) FIFO 804 via flow 806.

In one embodiment, in the event that the packet request is denied, processing proceeds to the next packet in the queue for a request. Processing of that next packet then proceeds as outlined hereinabove. In the event that the request is granted and processing of that next packet proceeds to its destination port, a new request is made for the previously denied packet. Otherwise, in the event of a second denial, processing proceeds to the next packet in the queue for a request. As the denied-request handling provides for multiple (e.g. three-deep) sequential packet requests, if the third packet in line is denied, processing reverts back to the first packet for making a new request.
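
The three-deep request sequencing described above may be sketched, by way of non-limiting example, as follows; only the depth of three is taken from the text, and the remainder is an illustrative assumption.

```c
#define RETRY_DEPTH 3  /* three-deep sequential requests, per the text */

struct retry_seq {
    int cursor;        /* which of the three head packets requests next */
};

/* Called once per arbitration round. 'granted' reports the outcome of
 * the previous request for packet [cursor]. Returns the packet slot
 * (0..2) that should raise the next request. */
int next_requester(struct retry_seq *s, int granted)
{
    if (granted) {
        /* The granted packet departs; a new request is made for the
         * previously denied packet at the head of the line. */
        s->cursor = 0;
    } else {
        /* Denied: advance to the next queued packet, wrapping back to
         * the first after the third denial, as described above. */
        s->cursor = (s->cursor + 1) % RETRY_DEPTH;
    }
    return s->cursor;
}
```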

Thus, there is disclosed a non-Clos data network switching apparatus for communicating data packets from a first switch-connected peripheral device to a second switch-connected peripheral device, the apparatus comprising: a chassis; a plurality of line cards housed within the chassis and having I/O ports for transceiving data packets; a control processor configured to maintain a lookup table mapping peripheral device connections with corresponding I/O ports associated with the plurality of line cards; and a crossbar switching element on each line card, the crossbar switching element configured to enable electrical connection of any one of the line card I/O ports, through a direct point-to-point electrical mesh interconnect pattern which connects each of the plurality of line cards with every other one of the line cards, to a corresponding destination port on one of the plurality of line cards, in response to detection of a data packet on an ingress I/O port of a given line card, and according to the lookup table mapping based on an address header of the data packet, whereby transmission of packets between input and output ports of any two line cards and respective crossbar switch elements occurs in only two hops.

The embodiments are provided by way of example only, and other embodiments for implementing the systems and methods described herein may be contemplated by one of skill in the pertinent art without departing from the intended scope of this disclosure. For example, although embodiments disclose a data packet network architecture, apparatus, device, and/or method that implements the semiconductor crossbar switch element onto or associated with a given line card, such configuration is not essential to the practice of the disclosure, as such switch elements may be implemented in or on other substrates, such as a backplane (or midplane), by way of non-limiting example. Further, although embodiments of the present disclosure illustrate a printed circuit electrical mesh interconnect connected in an interleaved backplane structure (relative to the line card/switch element configuration), such configuration is an advantageous embodiment but is not essential to the practice of the disclosure, as such electrical mesh interconnect may be implemented via other means, such as direct wire connection (with no backplane or printed circuit board) and/or via other non-backplane structure (e.g. on a line card). In an embodiment, discrete wires such as micro coaxial or twinaxial cables, twisted pairs, or other direct electrical wire connections may be made with the internal I/O ports of each of the FPGAs through connectors and micro wire cables such as those provided for high speed interconnects. Modification may be made for pigtails for cable-ready applications.

Still further, implementation of the present disclosure may be made to virtual switches within a data center or other segmented software-controlled data packet switching circuit. In such virtual data packet switched systems, the form of a plurality of semiconductor crossbar switch elements interconnected via a direct point-to-point electrical mesh interconnect, with integrated switching, forwarding, and routing functionality embedded into each crossbar switch, may be substituted for prior art (e.g. Clos network) implementations, in order to reduce hops, decrease power dissipation and usage, and enable execution on a high performance computer server to provide for virtual segmentation, securitization, and reconfiguration. The semiconductor crossbar switch elements may be configured as virtual switches within a virtual machine (VM) for providing routing using MAC address headers and lookup table mapping of configuration elements. As overlay network clients, or VMs, require gateways to provide routing functionality, the present disclosure enables OSI layer 2 or layer 3 switching for redirecting data message traffic, using the destination Media Access Control (MAC) address and logical sublayers to establish the initial connection, parse the output data into data frames, and address receipt acknowledgments and/or queue processing when data arrives successfully or, alternatively, when processing is denied.

By way of further example, processing systems described herein may include memory containing data, which may include instructions that, when executed by a processor or multiple processors, cause performance of the steps of a method for carrying out the operations set forth herein.

While the foregoing invention has been described with reference to the above-described embodiments, various additional modifications and changes can be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the appended claims, and the specification and the drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show, by way of illustration and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Such embodiments of the inventive subject matter may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.

What is claimed is:

1. A non-Clos network data packet switch comprising: a plurality of semiconductor crossbar switch elements connected to one another according to a direct point-to-point electrical mesh interconnect; wherein data packets from a first peripheral device connected to one of said semiconductor crossbar switch elements are communicated to a second peripheral device connected to another one of said semiconductor crossbar switch elements via said direct point-to-point electrical mesh interconnect according to a lookup table and network device addressing for reduced switching power.

2. The switch of claim 1, wherein the lookup table includes a mapping of peripheral device connections with corresponding I/O ports associated with said plurality of semiconductor switch elements according to the point-to-point connectivity of the electrical mesh interconnect to corresponding I/O ports of each semiconductor switch element.

3. The switch of claim 2, wherein a destination semiconductor crossbar switch element and destination I/O port is determined according to the lookup table mapping using said data indicative of the destination address as an index to the lookup table.

4. The switch according to claim 1, wherein each of the plurality of semiconductor switch elements includes a control plane having a processor and memory in communication with a master controller and address routing table for receiving routing table entries and updates for transfer into each of the semiconductor switch elements.

5. The switch of claim 1, wherein each of said plurality of semiconductor switch elements includes at least one field programmable gate array (FPGA).

6. The switch of claim 1, wherein the point-to-point electrical mesh interconnect is comprised of at least one multi-layer stack of electrically interconnected printed circuit boards.

7. The switch of claim 6, wherein the at least one multi-layer stack of electrically interconnected printed circuit boards is silicon-free.

8. The switch of claim 1, wherein said data packets comprise header information including data indicative of a source address of the source peripheral device and of a destination address of the destination peripheral device.

9. The switch of claim 8, wherein the header information includes one of a MAC address and an IP address, and wherein the lookup table stores one of MAC addresses and IP addresses corresponding to connected peripheral devices.

10. The switch of claim 1, wherein one or more of said semiconductor switch elements is configurable for one of 10 Gb, 25 Gb, 40 Gb, 50 Gb, and 100 Gb signal line processing.

11. A method of communicating data packets in a non-Clos data packet switching network, comprising: connecting a plurality of semiconductor crossbar switch elements to one another according to a direct point-to-point electrical mesh interconnect; and transferring data packets from a first peripheral device connected to one of said semiconductor crossbar switch elements to a second peripheral device connected to another one of said semiconductor crossbar switch elements via said direct point-to-point electrical mesh interconnect, according to a lookup table and network device addressing; wherein said lookup table includes mapping of each of the plurality of semiconductor switch elements' I/O ports to another one of said plurality of semiconductor switch elements' I/O ports, according to the point-to-point connectivity of the electrical mesh interconnect.

12. The method of claim 11, wherein said mapping of values in said lookup table is performed according to a hash function.

13. The method of claim 11, wherein the data packets are transferred from the first peripheral device to the second peripheral device by said first semiconductor crossbar switch element prepending an indicator of the destination I/O port of the second semiconductor crossbar switch element and outputting the data packet and said indicator onto said direct point-to-point electrical mesh interconnect for transfer to said second semiconductor crossbar switch element.

14. The method of claim 13, further comprising discarding header data prior to said output of the data packet onto the second semiconductor switch element destination I/O port identified.

15. The method of claim 11, further comprising diverting packets to a buffer for subsequent processing when said packets cannot be forwarded due to contention within one of said semiconductor switch elements.

16. The method of claim 11, further comprising, on the condition that a device address is not in the lookup table, forwarding said device address for updating into a master table.